fix: separate byte and character limits in BQ plugin GCS text offload by caohy1988 · Pull Request #5565 · google/adk-python

caohy1988 · 2026-05-01T06:22:37Z

Summary

Fixes #5561. Stacked on PR #5528 (fork detection fix).

The GCS text offload decision in HybridContentParser._parse_content_object mixed byte-based and character-based limits in a single min() comparison, producing wrong offload decisions for multi-byte text.

Problem

# Before: mixed-unit comparison
text_len = len(part.text.encode("utf-8"))  # BYTES
offload_threshold = self.inline_text_limit  # 32KB — bytes
if self.max_length != -1 and self.max_length < offload_threshold:
    offload_threshold = self.max_length     # characters!
if self.offloader and text_len > offload_threshold:  # bytes vs ???

When max_content_length < inline_text_limit, the threshold becomes a character count that is then compared against a byte measurement. Example: 3K emoji characters (12K UTF-8 bytes) with max_length=10000 are under both real limits (3,000 characters <= 10,000 and 12,000 bytes <= 32,768), but the old code computed min(32768, 10000) = 10000, and 12,000 bytes > 10000 triggered a false offload.
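The false offload can be reproduced outside the plugin. The sketch below is a standalone approximation of the old decision, not the plugin's actual code: `old_offload_threshold` is a hypothetical helper, and the 32768-byte value is assumed from the 32KB figure quoted above.

```python
def old_offload_threshold(inline_text_limit: int, max_length: int) -> int:
    """Mimic the pre-fix logic: a byte limit and a character limit share one min()."""
    threshold = inline_text_limit  # measured in bytes
    if max_length != -1 and max_length < threshold:
        threshold = max_length  # measured in characters, yet used as a byte threshold
    return threshold


text = "\U0001F600" * 3000             # 3,000 emoji, each 4 bytes in UTF-8
char_len = len(text)                   # code points
byte_len = len(text.encode("utf-8"))   # UTF-8 bytes

threshold = old_offload_threshold(inline_text_limit=32768, max_length=10000)
old_decision = byte_len > threshold    # bytes compared against a character count

print(char_len, byte_len, threshold, old_decision)  # 3000 12000 10000 True
```

The text is below the character limit (3000 <= 10000) and below the byte limit (12000 <= 32768), yet the mixed comparison reports that it should be offloaded.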

Fix

Evaluate each limit in its own unit — no mixed min():

char_len = len(part.text)
byte_len = len(part.text.encode("utf-8"))

exceeds_inline_byte_limit = byte_len > self.inline_text_limit
exceeds_char_limit = (
    self.max_length != -1 and char_len > self.max_length
)

if self.offloader and (exceeds_inline_byte_limit or exceeds_char_limit):

inline_text_limit (32KB): controls inline storage size — bytes
max_content_length: controls truncation — characters
Text is offloaded if either limit is exceeded
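As a rough illustration of the rule above, the decision can be written as a free function. `should_offload` and its parameters are hypothetical names for this sketch; in the plugin the logic lives inside HybridContentParser._parse_content_object, and the 32768 default is assumed from the 32KB figure.

```python
def should_offload(
    text: str,
    inline_text_limit: int = 32768,  # bytes (assumed value for "32KB")
    max_length: int = -1,            # characters; -1 means no character limit
    has_offloader: bool = True,
) -> bool:
    """Return True when either limit is exceeded, each checked in its own unit."""
    char_len = len(text)
    byte_len = len(text.encode("utf-8"))

    exceeds_inline_byte_limit = byte_len > inline_text_limit
    exceeds_char_limit = max_length != -1 and char_len > max_length

    return has_offloader and (exceeds_inline_byte_limit or exceeds_char_limit)


emoji = "\U0001F600"  # 1 character, 4 UTF-8 bytes

# Regression case from #5561: 3K emoji, under both limits, stays inline.
regression_case = should_offload(emoji * 3000, max_length=10000)
# 10K emoji are 40,000 bytes, which exceeds the byte limit.
byte_limit_case = should_offload(emoji * 10000)
# 15K ASCII characters exceed the character limit but not the byte limit.
char_limit_case = should_offload("a" * 15000, max_length=10000)
# Without an offloader nothing is offloaded, regardless of size.
no_offloader_case = should_offload("a" * 15000, max_length=10000, has_offloader=False)

print(regression_case, byte_limit_case, char_limit_case, no_offloader_case)
```

Keeping the two comparisons separate means neither limit needs to be converted into the other's unit, which would be lossy for text that mixes single-byte and multi-byte characters.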

Test plan

220 tests pass (213 existing + 2 fork detection + 5 offload), 0 regressions
test_multibyte_text_offloaded_by_byte_limit — 10K emoji (40KB UTF-8) offloaded via byte limit
test_ascii_under_both_limits_stays_inline — small ASCII stays inline
test_text_exceeding_char_limit_offloaded — ASCII over char limit but under byte limit is offloaded
test_no_offloader_falls_back_to_truncate — without offloader, truncates inline
test_multibyte_under_char_and_byte_limits_stays_inline — regression test: 3K emoji (12K bytes) with max_length=10000 stays inline (old code falsely offloaded)

🤖 Generated with Claude Code

…plugin

When the plugin is deployed via Vertex AI Agent Engine, it is pickled for transport and unpickled on the server. __getstate__ sets _init_pid = 0 as a pickle sentinel. On the server, _ensure_started() checks os.getpid() != self._init_pid, which always evaluates to True since os.getpid() is never 0. This triggers _reset_runtime_state() on every cold start even though no fork happened, producing a misleading "Fork detected (parent PID 0, child PID xx)" warning and adding unnecessary startup latency from tearing down and re-creating gRPC state that was already clear.

The fix distinguishes "unpickled, never initialized" (_init_pid == 0) from "forked from a different process" (_init_pid != 0 and _init_pid != os.getpid()). Real forks are still detected by both os.register_at_fork (line 108) and this PID check.

Related: haiyuan-eng-google/BigQuery-Agent-Analytics-SDK#86

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

After _lazy_setup succeeds, set _init_pid = os.getpid() when it was the pickle sentinel (0). Without this, an unpickled plugin keeps _init_pid == 0 forever, disabling the PID-based fork check for the rest of the instance's lifetime.

Also fix test_reset_on_real_fork to use max(os.getpid() - 1, 1) instead of hardcoded 99999.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
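The PID bookkeeping described in these two commit messages can be sketched roughly as follows. `PluginSketch` is a hypothetical stand-in that models only the `_init_pid` handling; the bodies of `_lazy_setup` and `_reset_runtime_state` are placeholders, and having the reset re-record the current PID is an assumption of this sketch rather than something the commits state.

```python
import os

PICKLE_SENTINEL = 0  # mirrors the `_init_pid = 0` sentinel set by __getstate__


class PluginSketch:
    """Hypothetical stand-in for the plugin; only PID tracking is modeled."""

    def __init__(self):
        self._init_pid = os.getpid()
        self.resets = 0  # how many times runtime state was torn down

    def __getstate__(self):
        state = self.__dict__.copy()
        state["_init_pid"] = PICKLE_SENTINEL  # "unpickled, never initialized"
        return state

    def _reset_runtime_state(self):
        self.resets += 1
        self._init_pid = os.getpid()  # assumption: reset re-records the PID

    def _lazy_setup(self):
        pass  # placeholder for the real client setup

    def _ensure_started(self):
        pid = os.getpid()
        # Only a non-sentinel PID that differs from ours indicates a fork.
        if self._init_pid != PICKLE_SENTINEL and self._init_pid != pid:
            self._reset_runtime_state()
        self._lazy_setup()
        # Re-arm the PID check once an unpickled instance has started.
        if self._init_pid == PICKLE_SENTINEL:
            self._init_pid = pid


# Simulate what default unpickling does: rebuild from __getstate__ output.
original = PluginSketch()
restored = PluginSketch.__new__(PluginSketch)
restored.__dict__.update(original.__getstate__())

restored._ensure_started()
resets_after_cold_start = restored.resets   # no fork happened, so no reset
pid_after_cold_start = restored._init_pid   # sentinel replaced by the real PID

restored._init_pid = os.getpid() + 1        # pretend a different process started it
restored._ensure_started()
resets_after_fork = restored.resets
```

In this sketch the first `_ensure_started()` call after unpickling performs no reset, and a later PID mismatch still triggers exactly one.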

google-cla · 2026-05-01T06:22:54Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

adk-bot · 2026-05-01T06:23:41Z

Response from ADK Triaging Agent

Hello @caohy1988, thank you for creating this PR!

Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). You can find more information at https://cla.developers.google.com/.

Thanks!

The GCS text offload decision mixed byte-based and character-based limits in a single min() comparison. inline_text_limit (32KB) is a byte-based storage guard, while max_content_length is a character-based truncation limit. Computing min(bytes, chars) produced wrong offload decisions for multi-byte text (CJK, emoji).

The fix evaluates each limit in its own unit:
- inline_text_limit: compared against UTF-8 byte length
- max_content_length: compared against character count

Text is offloaded if either limit is exceeded.

Includes regression test for the specific google#5561 case: 3K emoji chars (12K bytes) with max_length=10000 — under both real limits but falsely offloaded by the old mixed-unit min().

Fixes google#5561

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

caohy1988 and others added 2 commits April 28, 2026 11:23

adk-bot added the services label ([Component] This issue is related to runtime services, e.g. sessions, memory, artifacts, etc.) May 1, 2026

caohy1988 force-pushed the fix/bqaa-offload-unit-mismatch branch from 7b6e5ef to 040b479 May 1, 2026 06:48

caohy1988 wants to merge 3 commits into google:main from caohy1988:fix/bqaa-offload-unit-mismatch
